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Cosmological simulations are crucial tools in studying the Universe, 
but they typically do not directly match real observed struc- 
tures. Constrained cosmological simulations, on the other hand, 
are designed to match the observed distribution of galaxies. Here, 
we present constrained simulations based on spectroscopic surveys 
at a redshift of z ~ 2.3, corresponding to an epoch of nearly 11 
Gyrs ago. This allows us to “fast-forward” the simulation to our 
present-day and study the evolution of observed cosmic structures 
self-consistently. We confirm that several previously-reported pro- 
to clusters will evolve into massive galaxy clusters by our present 
epoch, including the ’Hyperion’ structure that we predict will col- 
lapse into a giant filamentary supercluster spanning 100 Mega- 
parsecs. We also discover previously unknown protoclusters, with 
lower final masses than typically detectable by other methods, that 
nearly double the number of known protoclusters within this vol- 
ume. Constrained simulations, applied to future high-redshift data 
sets, represents a unique opportunity for studying early structure 
formation and matching galaxy properties between high and low 
redshifts. 


Introduction 


Understanding the formation of large-scale structures in the Universe, starting from 
tiny fluctuations in the matter density, and subsequently evolving gravitationally into 
the complex cosmic web seen at the present epoch, is a key ambition of cosmological 
science. The formation and evolution of galaxy clusters throughout different cosmic 
epochs is a crucial probe of the current cosmological ACDM concordance model (see 
[1] for a review), and is studied with a wide range of observational techniques. 

The collapse time of massive galaxy clusters is of the order of the Hubble time, 
thus we expect their formation to typically complete at redshifts z < 1. At higher 
redshifts z = 1.5, we usually do not observe collapsed massive galaxy clusters but 
instead extended accumulations of galaxies that do not yet form bound structures 
[2] or only a collapsed core of the whole structure [3, 4] 

These diffuse formations in the evolving cosmic web [5, 6], that correspond to 
Lagrangian volumes of V > (10 h~! Mpc)? [7], are called galaxy protoclusters (see 
[8] for a review) and represent the progenitors of galaxy clusters seen in the Local 
Universe. As gravitationally evolving objects, protoclusters are ideal observables to 
study early structure formation and to compare to theoretical predictions. Moreover, 
they are active sites of star and galaxy formation during the z ~ 2 — 3 Cosmic Noon 
epoch [9], which makes them excellent laboratories to jointly study the interplay 
between baryonic physics and dark matter models. 

Observationally, the effort to find and characterize protoclusters is a lively, ongo- 
ing field. In particular, the COSMOS field [10] is an excellent site for this, as it is 
covered by deep and coordinated multi-wavelength observations over a wide field 


Springer Nature 2021 ‘TEX template 


Fate of COSMOS Protoclusters 3 


(> 1 square degree, corresponding to > 100 Mpc in the transverse extent) suited to 
protocluster studies. 

In addition, surveys using Lya forest tomography — which probes the large-scale 
diffuse hydrogen distribution — have targeted parts of this field [11-14], offering a 
complementary way to study structure formation apart from galaxy surveys. Mul- 
tiple studies have identified protoclusters in COSMOS with various claims they are 
potentially progenitors of galaxy clusters such as Coma or Virgo, with present-day 
masses up to ~ 101° Mo [15-26]. However, these estimates were done with different 
methods that may not be consistent with each other, and typically did not take into 
account the large-scale structure environment over > 10 not Mpc, which is known to 
determine the evolution of the protoclusters [2, 7, 8]. Also, studies show that den- 
sity peaks at high redshifts do not necessarily collapse into massive galaxy clusters 
at z = 0. In turn, massive structures at z = 0 do not always originate from single 
high redshift density peaks [27]. Up to this point, there has not been a uniform and 
self-consistent study dedicated to these structures in the COSMOS field. 

We address this problem with constrained simulations applied towards the rich 
legacy of large-scale spectroscopic surveys that have been conducted on the COSMOS 
field over nearly a decade, achieving a cosmic volume and number density unmatched 
anywhere else on the sky. Standard cosmological simulations start from randomly 
created Gaussian matter density fluctuations and produce structures that are statisti- 
cally consistent with the observable Universe, but are not intended to directly match 
actual observed structures. On the other hand, constrained cosmological simulations 
are designed to reproduce specific large-scale structures in observations. However, 
hitherto, most applications have mainly focused on the Local Universe or nearby 
structures [28-32]. By performing cosmological simulations constrained from high- 
redshift data (z > 2), we can potentially probe epochs when cosmic structure growth 
was still in the quasi-linear regime. This allows a more direct mapping between the 
initial conditions and the observed galaxy distribution than feasible in the strongly 
non-linear regime of later epochs. Moreover, it opens up the exciting possibility of 
forward-modeling the evolution of the observed high-redshift structures to the present 
time, to predict whether or not they will collapse into massive cluster halos. 


Constrained Simulations of the COSMOS Field 


For our analysis, we use initial density fluctuations at z = 100 that would evolve 
into a matter density field consistent with the 3D distribution of a large sample of 
spectroscopically-confirmed galaxies at Cosmic Noon [33]. This sample was compiled 
from the zCOSMOS-Deep [34], VUDS [35] MOSFIRE [36], and ZFIRE [15] redshift 
surveys over the redshift range of 1.4 < 2° < 3.6 and lie within the central square 
degree of the COSMOS field centered on R.A.= 150.1°, Dec= 2.2° in the J2000 
ICRS frame. In cosmology, the redshift is often used as a proxy for both cosmological 
distance and cosmological time, which in our case is generally disconnected. To dis- 
ambiguate, we use 28 when referring to observed distance or equivalent comoving 
line-of-sight position in our simulations, and z as the time label for the simulation. 

Special care was taken to accurately calculate the selection functions of the indi- 
vidual surveys to ensure an unbiased estimate across the different surveys [33]. We 
then applied a multi-survey adaptation [37] of the COSMIC BIRTH algorithm [38], 
which is a nested Bayesian inference algorithm for estimating the initial conditions 
and underlying matter density field. The reconstruction grid was chosen to be 2563 
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Fig. 1 Matter density field of the COSMOS volume in the observed redshift range of 
2.00 < z°bS < 2.52, from one realization out of 50 constrained simulations. Top & Middle: 
A z = 2.3 snapshot of our constrained simulations. The highlighted area on the top panel 
indicates the observed region covered by the four galaxy surveys. Black dots show the galaxy 
positions. Black crosses show the Literature-Reported Protocluster Candidates (LRPCs), 
while the blue diamonds correspond to the center of mass position of protoclusters seen 
in COSTCO. The white lines show which LRPC is matched to each protocluster in a r = 
15 h~1 Mpc search radius. Bottom: The density field of the z = 0 snapshot. The white 
circles indicate the halos that were successfully matched with LRPCs while blue circles show 
unmatched halos at z = 0. In both slices, the ordinate axis approximately corresponds to 
the Declination dimension in sky coordinates. 


cells with a side length of Ly, = 512 not Mpc, which is large enough to cover 
the comoving line-of-sight extent represented by 2.00 < gobs < 2.52 in COSMOS. 
This redshift range was specifically selected to overlap with multiple protoclusters 
previously reported in the literature [15, 16, 18—25]). 

From the posterior sample of initial conditions, we randomly selected 50 realiza- 
tions to seed the simulations which were then run using the PKDGRAV3 N-body 
code [39]. We dub the corresponding simulation suite as “COnstrained Simulations 
of The COsmos field" (COSTCO). 

The COSTCO multiverse simulations were run from the initial conditions at 
z = 100 until z = 0, with 6 intermediate time snapshots being output at z = 2.0—2.5. 
Figure 1 shows projected slices of the matter density field from one of the constrained 
realizations, centered at approximately R.A.=149.99° in the COSMOS field. We also 
show as fine dots the observed galaxy positions in COSMOS, which trace the z = 2.3 
matter density field as expected. 

Evolved to z = 0, we see that the fuzzy and diffuse matter distributions at z = 2.3 
have collapsed into much larger structures with higher density contrast. 

The different realizations show variation in the resulting z = 0 filamentary struc- 
tures (see Figure 6 in the Appendix), but massive clusters form consistently in 
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Fig. 2 Two examples of massive Myi, > 2 x 10!4 p-1 Mo halos in our constrained sim- 
ulations. The z = 0 center of the virialized halo is shown as a red circle, with the member 
dark matter particles shown as red dots. The blue dots show the same dark matter particle 
positions at the z = 2.3 snapshot. Additionally, we mark the z = 2.3 center of mass position 
of the dark matter particles with a cyan diamond. The coordinate axes are the same as in 
the matter density plot shown in Figure 1. 


positions corresponding to large overdensities at z = 2.3. An ensemble of simula- 
tions like COSTCO, with consistent large-scale structure and fluctuations on smaller 
scales is called Local Ensemble Statistics in the literature and was first studied in [40]. 
Next, we identified collapsed halos with the ROCKSTAR halo finder [41], that have 
a minimum virial mass of Myi; > 2 x 10!4 7-1 Mo at z = 0, which is approximately 
equivalent to the mass of the Virgo cluster, the nearest massive cluster to the Milky 
Way. We then traced the particles of each halo back to their Lagrangian positions at 
the nearest time snapshot to their observed redshift at 2.00 < goes < 2.52, and com- 
puted the center-of-mass position of the protocluster. Examples of this are shown 
in Figure 2, which illustrates the extended (= 20 ite Mpc) nature of galaxy proto- 
clusters. It is interesting to note that the protocluster center-of-mass can experience 
significant displacements (~ 10 h~! Mpc) as it evolves over 11 Gyr from z = 2.3 
to z = 0. Indeed, in the case of the lower-mass cluster, the final halo position is at 
an Eulerian coordinate almost outside the envelope of its z ~ 2.3 progenitor parti- 
cles. This illustrates the crucial role that complex gravitational interactions across 
the large-scale cosmic web plays in protocluster evolution, such that oversimplified 
modeling with e.g. linear theory can lead to inaccurate results. The variance across 
different realizations in the mature z = 0 masses and Lagrangian center-of-mass 
coordinates at z = 2.3 allow us to calculate uncertainties on the position and final 
mass for each identified protocluster. 


Results 


To compare with observed protoclusters in the literature (which we term Literature- 
Reported Protocluster Candidates or ‘LRPCs’), we convert the reported coordinates 
for each LRPC into equivalent Cartesian coordinates in our simulation grid, where 
the observed redshift, gers, is converted into a line-of-sight comoving distance using 
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a fiducial Planck 2018 cosmology [42]. We then search for the center-of-mass coordi- 
nates of COSTCO protoclusters within a spherical aperture of radius 15 not Mpc. 
The following are our constrained simulations results for the LRPCs (which are also 
summarized in Table 1): 


ZFOURGE/ZFIRE: This protocluster was first identified through medium- 
band imaging [15], and subsequently confirmed through spectroscopic follow-up 
at z = 2.095 [16]. We find a matched protocluster in all our realizations and 
predict it to evolve into a Coma-like cluster halo of Myi, = (1.2 + 0.3) x 
10°97 A" Mo. The COSTCO simulation ensemble predicts that the center- 
of-mass will experience only very small displacements along the line-of-sight 
between z = 2.1 and z = 0, but there will be a significant systematic displace- 
ment toward lower declination values at z = 0. This suggests the presence of 
another significant overdensity to the South of the current data sample, which 
we will follow-up in a separate paper. 





Hyperion: Various authors had successively discovered protoclusters at 
2.423 < 2° < 2.507, across a region spanning approximately A@ = 0.5° on 
the sky [18-22]. It was then argued by [23] that these protoclusters likely form a 
connected structure, dubbed ‘Hyperion’, with seven distinct density peaks that 
will evolve into a super-cluster by z = 0, with a total final mass of 4.8 x 10°" Mo. 
COSTCO successfully identifies massive cluster halos at the reported position 
of Hyperion in all the realizations. Our constrained simulations show that a 
partial merging of Hyperion is likely by z = 0, but we never witnessed a com- 
plete merging of all the constituent peaks to form one cluster only. Instead, 
COSTCO indicates that, on average, four virialized clusters will coalesce out 
of Hyperion to form a massive filamentary group of clusters with an aggregate 
mass of Myr = (2.50.5) x 10!° h~! Mo within collapsed cluster halos, span- 
ning over a distance of dyyp = (65 + 10) h—'Mpc. The most massive halo is 
found to have a virial mass of Myi,(z = 0) = (1.25+0.35) x 10'°h7! Mo. The 
final collapsed structure is approximately equivalent in spatial extent and mass 
to the Coma/A1367 filament in the Local Universe [43]. Another analog is the 
elongated supercluster core of the Sloan Great Wall structure [44], although the 
limited volume of the current COSMOS data does not permit us to investigate 
whether the analogy holds to the full scale of the Great Wall. Examples of 3D 
visualizations of Hyperion are shown in Figure 3 for 4 different realizations. 











CC2.2: In 42 out of 50 cases, we find a cluster of Myi;(z = 0) = (4.2+1.9) x 
io Mo eventually forming out of an overdensity very close to the CC2.2 
protocluster [24]. Our mean final halo mass for CC2.2 is less massive than 
estimated by [24], although the tension is relatively weak at only 1.440 and 
could easily be due to the different methods and data sets. 


G237: We do not find a consistently forming cluster that arises from the 
reported position of the recently discovered G237 protocluster, which was ini- 
tially detected as a far-IR excess in the Planck satellite data and is reported 
to have Myir(z = 0) ~ 3x oh Mo. A closer inspection of our galaxy 
data shows only 2 galaxies in a sphere of 5 arcmin radius centered at R.A = 
150.507°, Dec = 2.31204°, and z°>S = 2.16, where G237 is observed. We ana- 
lyzed the same region in our z = 2.3 snapshots and searched for z = 0 halos 
with Myir > 2 x 10/4 p-! Mo that had their origin in this area. Throughout 
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Fig. 3 3D visualizations of the Hyperion supercluster filament for 4 realizations at z = 0, 
showing that the filamentary structure occurs consistently within our ensemble of realiza- 
tions. The dots indicate the comoving positions of the observed z°>S = 2.4 — 2.5 galaxies 
that were used in the reconstructions. 


the 50 realizations, this only happens twice within the observed volume. How- 
ever, we note that G237 is reported at a position that is near the margins of our 
observed region and covered by only the zCOSMOS-Deep survey, i.e. a region 
where we do not expect strong constraints. Nevertheless, our constrained sim- 
ulations only disfavor a protocluster at the 1.750 level which is not significant 
especially considering the low mass reported by [25]. 


CCPC-z22-006: At the reported coordinates by [17], namely R.A. = 149.93°, 
Dec = 2.2°, and an observed redshift of 2°?’ = 2.283, we cannot confirm a 
consistently forming protocluster. In the z = 2.3 snapshot, we can only see in 2 
out of 50 cases progenitor particles corresponding to a z = 0 cluster within the 
search radius. However, in these cases the particles are associated with a nearby 
forming protocluster 25 hot Mpc away, which we will discuss below. Also, in 
contrast to G237, CCPC-z22-006 lies within the well observed region of the 
used galaxy surveys and therefore, our analysis tends to disfavor a protocluster 
at the exact position of CCPC-z22-006. 


Apart from the LRPCs, we also find several protoclusters in the COSMOS field 
that are, to our knowledge, newly-discovered through COSTCO. These typically do 
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Name [Ref.| R.A. [deg] Dec [deg] gobs Final Mass [h~! Mo| 
ZFOURGE/ZFIRE [15, 16] 150.094 2.251 2.095 (1.2 £0.3) x 10% 
G237* [25] 150.507 2.312 216 —— 
CC2.2* [24] 150.197 2.003 2.232 (4.2£1.9) x 10% 
CCPC-z22-006 [17] 149.930 2.200 2.283. —— 

150.093 2.404 2.468 

149.976 2.112 2.426 

149.999 2.253 2.444 
Hyperion [18-23] 150.255 2.342 2.469 (2.5 +0.5) x 1015 

150.229 2.338 2.507 

150.331 2.242 2.492 

149.958 2.218 2.423 
COSTCO J100026.4+020940  ~—*:150.110 2.161 2.298 (4.64 2.2) x 10 
COSTCO J095924.0+021435 149.871 2.229 2.047 (6.1£2.5) x 10 
COSTCO J100031.0+021630 ‘150.129 2.275 2.160 (5.34 2.6) x 10% 
COSTCO J095849.4+020126 —:149.706 2.024 2.391 (6.6 £2.3) x 10™ 
COSTCO J095945.1+020528  ~—«:149.938 2.091 2.283 (4.32.4) x 10™ 








Table 1 COSMOS field protoclusters in the redshift range of 2 < z°P§ < 2.52. For 
protoclusters previously reported in literature, we report the published coordinates but the 
final mass is from our analysis. An asterisk (*) denotes that our study does not include the 
recent spectroscopic follow-up observational data used for protoclusters G237 and CC2.2. 
The dashed line (——) denotes that our analysis does not find a z = 0 halo consistently 
forming at the reported position. The lower part of the table summarizes the protoclusters 
found in this work with the z = 2.3 center-of-mass positions and the estimated z = 0 final 
masses. 


not show up as strong overdensities at 2.00 < goes < 2.52, but rather as extended 
milder overdensities that nevertheless contain sufficient enclosed mass to collapse 
into clusters by z = 0. This is a new class of extended lower-mass protoclusters — 
with Virgo-like final masses of < 10"* Mo — that have previously eluded detec- 
tions but is now revealed by our combination of large-scale spectroscopic data and 
constrained simulation modelling. Since the observational data has slightly less con- 
straining power at these milder overdensities, this leads to larger relative errors in 
the derived masses. Nevertheless, all the newly-discovered protoclusters are robust 
~ 5a detections that show up in the majority of the constrained realizations: 


COSTCO J100026.4+-020940: In 48 out of 50 cases, we find a cluster pro- 
genitor forming at R.A. = 150.110°+0.042°, Dec = 2.161°+0.040°, and z°>S = 
2.298 + 0.007, with a final mass of Myi,(z = 0) = (4.6+2.2) x 10!4A7! Mo. An 
overdensity of galaxies was first noted at this location by [19], but they did not 
pursue a more detailed analysis. This halo forms ~ 28 h~'Mpe away from the 
reported position of CCPC-z22-006. In the z = 2.3 snapshot, we noticed that 
the progenitor particles corresponding to COSTCO J100014.4+020748 and 
CCPC-z22-006 were partially overlapping in 2 out of these 48 cases. However, 
the resulting cluster emerged at the position of COSTCO J100014.4+020748. 














COSTCO J095924.0+021435: In 42/50 realizations we find a protocluster 
at R.A. = 149.871° + 0.055°, Dec = 2.229° + 0.069°, at a distance of 2° = 
2.047 + 0.008. The final mass is Myi,(z = 0) = (6.1 + 2.5) x 10'4h7! Mo. 
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Fig. 4 Top: Confidence contours of the identified protoclusters (center-of-mass positions) 
with the DBSCAN method for all 50 realizations stacked into the same volume. ZFIRE and 
Hyperion show clearly the strongest recurrence, while the contours for COSTCO protoclus- 
ters are less distinct. Bottom: Averaged density field of all 50 COSTCO z = 2.3 snapshots. 
The galaxy positions are superimposed with back dots. The mean density and protocluster 
contours agree well with the galaxy distribution at z = 2.3. 


COSTCO J100031.0+021630: In 49/50 cases we see a cluster progenitor 
forming at R.A. = 150.129° + 0.051°, Dec = 2.275° + 0.082°, and z°>S = 
2.160 + 0.009. We find a z = 0 mass of Myiz = (5.3 + 2.6) x 10!4A71 Mo. 








COSTCO J095849.4+020126: We find a consistently forming protocluster 
at R.A. = 149.706° + 0.053°, Dec = 2.024° + 0.081°, and observed redshift of 
2S — 2.391 + 0.008 in 46 out of 50 simulations, with a final mass of My;,(z = 
0) = (6.6 + 2.3) x 10'4A7! Mo. 








COSTCO J095945.1+020528: We find a protocluster slightly to the South 
of Hyperion, which forms as a separate cluster in 27 out of 50 cases at 
R.A. = 149.938° +0.105°, Dec = 2.091° +0.081°, and z°S = 2.495+0.006. Our 
simulations predict an average mass of My;,(z = 0) = (4.342.4)x 104 h7! Mo. 
Closer inspection reveals that even though in 40 realizations at 20S — 2.48 
a clear overdensity forms in this vicinity, a cluster emerges only in 27 cases 
whereas in the other cases the overdensity is tidally disrupted by the gravita- 
tional drag of Hyperion. This cluster can potentially become a substructure of 
the Hyperion supercluster. 











We show the corresponding confidence map of the identified protoclusters in the top 
panel of Figure 4 , numbered in the same order as in Table 1. The bottom panel 
of Figure 4 shows the mean density of all 50 COSTCO z = 2.3 snapshots with the 
galaxy positions overplotted as black dots. 


Summary & Discussion 


Based only on analysis of existing large-volume spectroscopic surveys, we have 
increased the number of known galaxy protoclusters at 2 < z < 2.5 in the COSMOS 
field by 1.5-2x, while pushing down to final masses of Myi,(z = 0)  4x1014A7! Mo. 
This represents a lower mass detection limit than hitherto feasible with survey data 
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sets, from which protocluster detections were typically in the range of Myi,(z = 0) ~ 
io"? A" Mo [8]. This ability to detect and simultaneously model more representa- 
tive lower-mass protoclusters with large-volume surveys will allow a more uniform 
characterization of their demographics. In addition to the new objects, we have con- 
firmed and directly modeled the fate of several previously-known galaxy protoclusters 
in the volume. The Hyperion aggregation [23], in particular, had been the subject 
of scientific and public curiosity: its elongated, extended nature made it difficult to 
ascertain whether it would collapse into a singular massive cluster by z = 0. Our 
results clearly show that it will not, and instead points to a fate as multiple massive 
cluster cores embedded within a giant filamentary supercluster. 

These results illustrate the efficacy of constrained simulations applied to high- 
redshift data, although it does require wide and deep spectroscopic surveys that do 
not currently exist elsewhere on the sky apart from COSMOS. However, in the next 
few years, wide-field massively multiplexed fiber spectrographs deployed on 8m-class 
telescopes, such as the Subaru PFS [45] and VLT-MOONS [46], will carry out large- 
scale high-redshift surveys across angular footprints > 10x larger than the combined 
COSMOS data set that we used in our analysis. These upcoming surveys will allow 
constrained realization techniques to be even more effective in connecting cluster 
formation across cosmic time as well as the evolution of their constituent galaxies 
and gas. Constrained simulations of these upcoming high-redshift galaxy surveys will 
also allow us to probe early structure formation for consistency with the ACDM 
model with increasing sensitivity to lower mass galaxy (proto)clusters. Moreover, the 
identification of individual protoclusters (as presented in this work) combined with 
complementary studies such as X-ray and Lya observations will allow us to study gas 
accretion of star-forming galaxies within these protoclusters and can provide vital 
model constraints for high precision hydrodynamical simulations [8]. Each identified 
protocluster represents a unique environment to study the morphologies and merger 
rates of member galaxies in high-density environments at z > 2, which is, to date, 
only possible in theoretical studies or random cosmological simulations. 
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Supplementary Figure 5 Left: Galaxy positions on the sky for the four different surveys. 
The black crosses show the observationally reported overdensities. The blue diamonds are 
protoclusters identified in this work, dubbed as COSTCO. Right: Redshift distribution of 
the four surveys in the same colour code as the left plot. We chose a bin width of Az°Ps = 0.1. 
The grey dashed lines indicate the used redshift range. 


Methods 


Input Galaxy Surveys 


In this section, we summarize the galaxy catalogs that have been used in this study 
to compute the initial conditions. In total, we employed the data of four distinct 
spectroscopic galaxy redshift surveys, namely: 


The new release of the zCOSMOS-Deep [34] catalog, in which the astrometry 
has been updated to match the COSMOS2015 photometric galaxy catalog [48]. 
It also incorporates detailed visual inspections of the spectra that has led to 
more reliable redshift confidence flags compared to the original catalog (Lilly 
et al., 2021 in prep). 


The VIMOS Ultra Deep Survey (VUDS) [35] catalog, which was also covered 
substantial parts of the COSMOS field. The galaxies were originally selected 
for spectroscopy based on photometric redshifts from [50]. 


The final data release from MOSFIRE Deep Evolution Field (MOSDEF) Sur- 
vey [36], whose target selection is based on Hubble Space Telescope grism 
observations [52]. 


The KECK/MOSFIRE Spectroscopic Survey of Galaxies in Rich Environments 
at z ~ 2 (ZFIRE) [53], which is a spectroscopic follow-up of the ZFOURGE 
near-IR, medium-band imaging program [54]. 


Among the different surveys, we consider galaxies that are closer than 6 < 0.1” 
on the sky and additionally have a redshift separation of less than Age < 9.01 
as duplicates and remove the one with lower redshift confidence. The corresponding 
footprints and galaxy number density as a function of redshift are shown in Supple- 
mentary Figure 5. Further, we show the angular positions of the Literature Reported 
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Protocluster Candidates (LRPCs) and the additional COSTCO protocluster posi- 
tions (this work) as black crosses and blue diamonds, respectively. The redshift 
selection criteria and the calculations of the survey selection functions are described 
in detail in [33]. 


Initial Conditions 


In this section, we describe the initial conditions that were used to seed the 
constrained simulations. 

In contrast to standard simulations starting from randomly-generated Gaussian 
fluctuations, [56-59], we seek to find initial conditions that resemble the COSMOS 
galaxy distribution. 

First attempts to infer initial conditions from Gaussian random fields came into 
being nearly three decades ago [60-63] as starting point for cosmological simulations. 
Even though conceptually pioneering works, these methods suffered from low reso- 
lution and over-simplified assumptions, impeding further scientific applications. The 
technique has received renewed interest over the past decade thanks to advances in 
computing hardware as well as novel sampling techniques [30, 31, 37, 64]. 

In [33] the relevant initial Gaussian density field 6(q) for this work was recon- 
structed corresponding to the COSMOS field using a modified multi-survey version of 
the COSMIC BIRTH code [38], covering the observed redshift range 1.4 < z°P8 < 3.6. 
COSMIC BIRTH relies on the Hamiltonian Monte-Carlo (HMC) sampling [69] tech- 
nique, working on a discretized volume where each voxel of the reconstructed density 
field is a parameter of the posterior probability function. The algorithm estimates 
the posterior distribution of initial conditions that, after gravitationally evolving, 
would resemble the binned galaxy density field. The binned galaxy density field is 
therefore connected to the initial density field via a bias model, and the survey selec- 
tion function at each grid cell. This process is fully described in [33]. We computed 
five separate HMC chains with COSMIC BIRTH, starting at different initial seeds. 
Each chain ran at the minimum for ~ 10,000 iterations. After the burn-in phase, 
when the Markov Chain Monte Carlo (MCMC) reaches the stationary distribution, 
we randomly selected 50 individual realizations. We verified that the iterations were 
not closer than 500 MCMC steps within the chain to guarantee independent samples 
and accurately evaluate the posterior distribution. 

We perform the initial conditions inference in a cubical box with Dpo, = 
512 h- + Mpc comoving side length and a mesh grid of Nc = 2563, resulting a cell 
resolution of dj, = 2h Mpc. For the current analysis, we follow the same over- 
all method as in [33] but restrict the reconstruction region to a redshift range of 
2.00 < Ze < 2.52, in which a large number of reported protoclusters are located. 
We transform the coordinate system of the surveys so that they are aligned with the 
Cartesian axes of the COSMIC BIRTH box in the plane-parallel approximation and 
place the data region in the center of our reconstructed volume. The initial density 
fields are then calculated corresponding to a redshift of z = 100. 

To translate the initial density fields 6(q) into initial seeds for cosmological sim- 
ulations, we compute the three-dimensional white noise field wn(k) in Fourier space, 
defined as: 


wn(k) = (1) 
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Supplementary Figure 6 Pairs of matter density field slice plots showing the redshift 
snapshot of z = 2.3 (which is the mean redshift of the input galaxies) on the top panel and 
the z = 0 snapshot on the bottom. We signpost the galaxy positions with black dots on top 
of the z = 2.3 snapshot. We transformed the galaxy surveys to a coordinate frame where 
the line of sight direction is aligned with the abscissa. 


where 6(k) is the Fourier-transformed initial density field and P(k) is the theoretical 
three-dimensional ACDM power spectrum obtained with the CAMB code [70]. We 
read the white noise field into the MUSIC code (MUIti-Scale Initial Conditions) [57]. 
MUSIC places a dark matter particle in each cell of the initial density mesh grid and 
uses the density contrast to calculate the initial displacement and velocity field for 
each particle with second order Lagrangian perturbation theory. 


Constrained Simulation Methodology 


Over the last decade constrained simulations have helped to understand structure 
formation and the dynamics of the local Universe [71, 72], the cosmic density from 
Lyman-a tomographic observations [73], and also more recently to study massive 
galaxies [74] 

In this work, we extend this concept to Cosmic Noon by running N-body sim- 
ulations based on the COSMOS initial conditions introduced in section for initial 
conditions using the massively parallelized PKDGRAV3 N-body code [39], which 
utilizes the Fast Multipole Method for gravity calculations and has been tested in a 
2 trillion particle cosmological simulation. MUSIC returns the initial conditions files 
in the TIPSY format, that can be directly read in PKDGRAV3. In this study, we 
are mainly interested in the galaxy cluster evolution from the observed redshifts of 
the galaxy surveys, at z = 2.3, until z = 0. 

However, to precisely describe the observed structures over the time-evolution 
of their lightcones, we also output six PKDGRAV3 snapshots in the interval of z = 
2 — 2.5 with a separation of Az = 0.1. 

In Supplementary Figure 6 we show four different constrained simulations in pairs 
of z = 2.3 and z = 0. We also display the galaxy positions with black dots on top of 
the z = 2.3 snapshot. The density contrast of the z = 2.3 snapshot closely follows 
the galaxy distribution in all cases. Outside the observed region, i.e. ~ +20 io Mpc 
from the edges along line of sight and ~ +10h~! Mpc in transverse direction, the 
realizations have been substituted by a random field created from the ACDM power 
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spectrum. Differences in the small-scale structure of the realizations, in addition to 
the variation of random fields outside the observed area, lead to notably different final 
density fields at z = 0. However, we see consistently forming large-scale structure, 
such as the strong overdensity centered at 7) mw 2.1, and an extended filamen- 
tary structure centered at Gobo ~ 2.45. Other, less well constrained regions in the 
observed volume, show that the z = 0 outcome can be very different. For example, 
within the four z = 2.3 snapshot shown in Supplementary Figure 6, we can see dif- 
ferent fates of one of the overdensities at z°P® ~ 2.05: in one case, it collapses into 
a singular cluster at z = 0; in two cases becomes part of a wider overdense region; 
and in one case diffuses into a shallow density. Also, we can see that in the region 
between 2.12 < pers <S 2.18, a few density peaks consistently form at the z = 2.3 
snapshots. However, their final state can be a singular high density filament at z = 0, 
or, several aligned filaments that run in parallel to each other. Finally, it is worth 
mentioning that even though in all realizations the Hyperion supercluster is forming 
a massive filament, the internal structure changes. We can see a different number 
of member halos forming within the filament, mostly consisting of 4-5 massive halos 
(see also Figure 3 for a better visual impression). Future deep surveys with higher 
galaxy sampling compared to our dataset, in conjunction with improved modelling of 
redshift-space distortions and other effects, should allow more consistent predictions 
on the evolution of these structures. 

We also tested for a possible systematic influence of the simulations’ grid resolu- 
tion on the final results. To do so, we increase the resolution of our initial conditions 
from No = 256° to 1024° by augmenting small-scale modes in the Gaussian random 
field, while preserving the large-scale ones. This procedure is justified since we are 
interested in the most massive halos of these simulations, regardless of the halo sub- 
structures. The results of the high-resolution simulation is consistent with those of 
the low-resolution run performed with the same initial conditions, yielding clusters 
that agree to within ~ 2% in position and virial mass. This is clearly a subdominant 
effect as compared to the variance among the different realizations, and we consider 
the grid resolution of No = 256° to be converged for our purposes. The resemblance 
is shown in Supplementary Figure 7. 


Halo Analysis 


We identify halos from the PKDGRAV3 simulation snapshots using the ROCK- 
STAR phase-space halo finder [41]. Subsequently, we calculate the merger trees across 
the PKDGRAV3 snapshots using the consistent-trees package [77]. We then select 
halos at z = 0 that have a minimum virial mass of My;, > 2 x 10!4hA7! Me [9]. 
The theoretical ACDM halo mass function at z = 0 predicts Nyalos ~ 26 halos 
of Myiry > 2 x 10'4h~+ Mo in the considered COSMOS field volume [79] (see also 
Supplementary Figure 8). 

Next, we trace the z = 0 dark matter particles that ROCKSTAR associates with 
each halo back to z = 2.3, allowing us to self-consistently define a protocluster as 
the set of all progenitor dark matter particles of a z = 0 halo in its Lagrangian 
volume. The center-of-mass positions for each protocluster (in the observed volume) 
are shown with blue diamonds in the upper panel of Figure 1, while two examples of 
the virialized halos and their Lagrangian volumes are shown in Figure 2. 


Springer Nature 2021 ATX template 


Fate of COSMOS Protoclusters 19 


Observed redshift distance 2° 
21 2.2 2.3 


24 25 


Transverse distance{h~! Mpc] 





200 300 
Line of Sight distance [h~! Mpc] 
Observed redshift distance 2° 
22 2.3 


2) 21 24 2.5 


280. 


Transverse distance[h~! Mpc] 
a 
3 





200 300 
Line of Sight distance [h~! Mpc] 


Supplementary Figure 7 Comparison of the 256? (top) and the 1024° (bottom) simu- 
lation for the same initial conditions. We find a clear agreement of the large-scale structure 
of both simulations and a matching of cluster masses and positions with a ~ 2% accuracy. 


Matching the LRPCs 


Protocluster searches are a very important research topic for understanding structure 
formation at high redshifts [80-85]. 

In this section, we match the protocluster’s center-of-mass position to the LRPCs 
for all realizations by performing a nearest-neighbor search. For the choice of the 
search radius we must take into account the following conditions: 


® Cosmological displacements: 
The most massive halos, that grow in an isolated environment, grow nearly 
in-place with a final position at z = 0 that is nearly identical to the center- 
of-mass position of the protocluster. In contrast, less massive halos in dense 
environments show significant displacements from z = 2.3 to z = 0 that can be 
of order ~ 10h7! Mpc. 


@ Incomplete information: 

The observed galaxy catalogs contain limited information in two aspects. First, 
the phase-space distribution of the galaxies is incomplete, as we only have spa- 
tial information. The unknown peculiar velocities result in a distortion along 
line of sight, called redshift space distortions (RSD). Also, the velocity auto- 
correlation of galaxies at cosmological distances of r = 100h~! Mpc can still 
be high [86]. Consequently, structures outside of our observed volume have a 
significant influence on the velocities of our halos, called super-survey modes in 
the literature [87]. 


e Sampling variance: 
The different initial conditions inferred with the COSMIC BIRTH code are 
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individual Markov-Chain realizations drawn from a posterior probability dis- 
tribution. This chain variance depends on the statistical model, as well as on 
the marginalized nuisance parameters [33, 37, 38]. Therefore, we naturally 
expect a certain variance within the Markov-Chain samples. 


To account for these displacements we define a search radius of rgearch = 15 not Mpc 
around the LRPCs and scan for the center-of-mass positions of the z = 2.3 protoclus- 
ters in their vicinity. Details on the robustness and the justification for this search 
radius are given below. If multiple candidates are found, we choose the most mas- 
sive among them, or, if their Lagrangian volume overlaps at z = 2.3, we sum their 
masses at z = 0. If a matching of protoclusters and LRPCs is successful, we show 
the distance as white line in the upper panel of Figure 1. 


COSTCO Protoclusters 


Besides identifying the LRPCs that were previously known, we look for other recur- 
ringly forming halos within the COSTCO suite. For this purpose, we stack all halos 
of the COSTCO multiverse into one volume and apply the Density-Based Spatial 
Clustering of Applications with Noise (DBSCAN) algorithm, implemented in 
the Scikit-learn package. This algorithm identifies clustered points given the halo 
density of the COSTCO multiverse and allows to model noise of the point cloud with 
an additional parameter. To tune this noise parameter, we first apply DBSCAN to 
the LRPCs only. Once we find a best fit setup such that the LRPCs are robustly 
identified, we apply the exact same scheme to the remaining volume and look for 
additional clusters in the COSTCO multiverse. We then use the center-of-mass posi- 
tion of these newly identified clusters as anchor points and scan the 50 realizations 
at these positions for massive halos. This has also been studied with the MIP suite 
using a signal-to-noise ratio [40]. MIP is a set of correlated cosmological simulations 
and is based on random initial conditions. The initial conditions of the MIP real- 
izations are consistent in their large-scale Fourier modes, but were randomized on 
the small scales. This represents an interesting analogy to the constrained COSTCO 
simulations. We show the corresponding confidence map of the identified protoclus- 
ters on the top panel of Supplementary Figure 4 for all 50 realizations stacked into 
the same volume. We write the names of the LRPCs and COSTCO protoclusters 
(the same order as in Table 1) on top of the identified regions. It is clearly notice- 
able that the ZFIRE and Hyperion structures show the strongest significance, while 
the other structures have more extended and less distinct contour regions, which 
means that the uncertainty of these structures is higher. Nevertheless, all identified 
COSTCO protoclusters form in the majority of our realizations. This shows that 
constrained simulations are a well-suited method for the identification of less massive 
protoclusters, that would be less efficient with other methods. 

The bottom panel of Figure 1 shows the z = 0 snapshot and the two kinds of 
encircled halos. White circles indicate halos for which a protocluster-LRPC matching 
was successful. This means that the white circles represent the predicted z = 0 state 
of the LRPCs. Blue circles show massive halos that could not be linked to a LRPC 
partner and are newly discovered by our analysis. 

On the bottom panel of Figure 4 we show the mean density of all 50 COSTCO 
z = 2.3 snapshots. While densities outside the observed volume average out, we see 
an excellent agreement of the averaged structures and galaxy distribution within 
the constrained volume. Interestingly, on average COSTCO V arises from a stronger 
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overdensity than COSTCO II, but does not evolve to a higher mass than COSTCO II 
(Table 1). The main difference of these two structures lies in their respective large- 
scale environments: COSTCO II forms in a more isolated environment and can 
accrete surrounding matter without disruption, whereas COSTCO V is within the 
vicinity of the Hyperion filament /supercluster and likely experiences some form of 
large-scale mass stripping by its more massive neighbor. This stresses the importance 
of the large-scale environment for determining the evolution of a protocluster rather 
than purely relying on local galaxy number density. 


Robustness of the Halo Identifications 


We asses the robustness of our protocluster identifications. In particular, we want to 
know the probability of finding a massive halos at any given fixed position in a search 
radius of rgearch = 15 not Mpc within our simulations by chance. The theoretical 
ACDM halo mass function predicts about ny ~ 8 x 107° h® Mpc? halos per unit 
volume with M,j,; > 2 x 104 p-! Mo. Thus, we estimate the expected number of 
these halos in a volume of Vy = 4/3 Tacarch to be Vy X ny ~ 0.1. Thus, assuming 
that the halo distribution in 50 separate volumes is an independent and identically 
distributed random variable, we expect 5 halo detections in this search radius by 
chance. 

To confirm this estimate numerically, we ran five simulations with the same reso- 
lution and same set of cosmological parameters as COSTCO, but with random initial 
conditions. We then chose 10 fixed positions inside these random simulations and 
searched for My; > 2x10'4 p-! Mo halos in a radius of rgearch = 15 h—! Mpc, ensur- 
ing that these points were separated by at least 100 h~+ Mpc. On average we found 
~ 0.5 halos per position. Extrapolating this number to 50 realizations, we therefore 
expect to find ~ 5 halos by chance at a random position inside the COSTCO suite. 
This agrees with the analytical estimate from the halo mass function. 

Relying on Poisson statistics, we consider a COSTCO halo to be a robust 50 
detection, if its counterparts are found in at least > 27.5 realizations at the same 
position given the search radius of 15 nt Mpc. 

All our claimed COSTCO protoclusters shown in Table 1 fulfill this detection 
threshold, except for COSTCO J095945.1+020528 which was found in 27 out of 50 
realizations (4.90), but we have nevertheless decided to include it as a detection. 
Note that apart from this object and the other listed protoclusters, the next most 
significant candidate is found in only 8 out of 50 realizations (1.450). This further 
validates our decision to include COSTCO J095945.1+020528, since it clearly rises 
above the random background noise. 

For the robust identification of halos with smaller masses, we would need to 
shrink the search radius in order not to be contaminated by random detections. 
However, this requires a better knowledge of the z = 2.3 to z = 0 displacement 
field, which in turn requires a higher number density of tracers that constrains the 
initial conditions. In a follow-up paper, we will forecast the sensitivity of our methods 
toward lower-mass clusters in future high-redshift spectroscopic surveys. 


COSTCO Halo Mass Function 


We compare the halo mass functions (HMF) from the COSTCO realizations to the 
five random simulations and also to the theoretical HMF that we calculated with 
the HMFcalc package. As we can see in Supplementary Figure 8, the COSTCO as 
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Supplementary Figure 8 Mean halo mass function of the COSTCO runs (blue dashed 
line), five simulations with random initial conditions (orange dashed line) and the theoretical 
prediction (red line). Error bars in the same color as solid lines. 


well as the random initial conditions simulations show close agreement with the 
theoretical HMF within the uncertainties. Also we can see that the high mass end 
at Mualo > 101° ho? Mo gets very noisy due to the low number of halos. 

Within the statistical uncertainty of the HMF, we can see that the COSTCO 
simulations suite is consistent with the ACDM predictions. However, in contrast to 
the random simulations, the COSTCO HMF shows two bumps in the high-mass tail 
that slightly exceed the random simulations. We find that these bumps are caused by 
the ZFIRE and Hyperion cluster and may hint towards an overabundance of massive 
halos in the studied volume. We will the study the significance of the massive halos 
in more detail in a subsequent work. 


Effective Scale of the COSTCO Realizations 


The COSTCO simulation ensemble shares the same large-scale structure, while small 
scales can notably be different. Therefore, we evaluate the effective scale on which 
the different realizations are correlated. 

This is done as follows. 


@ We compute the mean of all 50 COSTCO z = 2.3 snapshots (see Supplemen- 
tary Figure 4 bottom panel). We select a rectangular volume that contains the 
observed region. 


@ We apply Gaussian smoothing with kernel sizes of rg = 3 — 6h +Mpc on a 
simulation with random initial conditions and select the same volume. 


@ We compute the power spectra of the COSTCO mean field and the variously 
smoothed random simulation in the selected volume. 


We show the resulting power spectra in Supplementary Figure 9. The dashed 
black line shows the power spectrum for the COSTCO mean field, while the solid 
lines show the different smoothing scales of the reference simulation that we ran with 
random initial conditions. Comparing the power spectra we can directly read off, that 
the COSTCO simulations are effectively correlated on scales of rg ~ 4.5 h-! Mpe 
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Supplementary Figure 9 Assessment of the effective scale calculated from the COSTCO 
simulations with the power spectra of the COSTCO mean field (dashed black line) and 
the smoothed reference simulation with random initial conditions (solid colored lines). The 


COSTCO realizations agree on scales that correspond to a Gaussian smoothing of rg ~ 
4.5h—-! Mpc. 


(compare green and orange solid line) while fluctuating for smaller scales. The corre- 
lated scales are well suited to analyze the formation and fate of galaxy protoclusters, 
focusing on the final masses and positions. 


Software 


This research made use of matplotlib [89], SciPy [90], Scikit-learn [91], Astropy [92, 
93], HMFcalc [79], Dark Emulator [94], pynbody [95], and ytree [96]. 
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